
Conversation

@linfeng-yuan (Collaborator) commented Sep 21, 2025

What this PR does / why we need it?

This miscellaneous PR contains several small fixes:

  1. Fix initialization and forward bugs of DeepseekMTPLayer when shared_expert_dp is enabled.
  2. Fix a tensor shape mismatch after o_proj caused by a workaround change in NPUModelRunner.
  3. Avoid an unnecessary decline of kv_cache memory (default: 64 MB) when use_cached_kv_cache_bytes is disabled.
  4. Fall back fused_moe_state from MC2 to All2All, since the padding logic of mc2_mask is incompatible with the input hidden_states when shared_expert_dp is enabled (see the sketch after this list).
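
For fix (4), a minimal sketch of the fallback condition is below. The names `FusedMoEState` and `select_fused_moe_state` are illustrative placeholders, not the exact vllm-ascend API; only fused_moe_state, MC2, All2All, and mc2_mask come from the PR description.

```python
from enum import Enum


class FusedMoEState(Enum):  # hypothetical enum, for illustration only
    MC2 = "mc2"
    ALL2ALL = "all2all"


def select_fused_moe_state(preferred: FusedMoEState,
                           enable_shared_expert_dp: bool) -> FusedMoEState:
    # With shared_expert_dp enabled, mc2_mask is padded independently of
    # the input hidden_states, so the MC2 path would see mismatched
    # shapes; fall back to All2All in that case.
    if enable_shared_expert_dp and preferred is FusedMoEState.MC2:
        return FusedMoEState.ALL2ALL
    return preferred
```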

Once this PR is merged, users can launch disaggregated_prefill deployments (large_ep) with deepseek_mtp and shared_expert_dp, as on the v0.9.1-dev branch. The remaining problem of the decline in kv_cache tokens compared to v0.9.1-dev will be resolved by #3073.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

E2E vLLM serving of deepseek_mtp with torchair graph mode, and with enable_shared_expert_dp in eager mode. Large EP deployments were also tested with this PR.


👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request addresses a bug in DeepSeek's Multi-Token Prediction (MTP) when enable_shared_expert_dp is active. The core of the fix involves conditionally using TorchairDeepSeekMTP for this configuration. Additionally, the logic for handling attention metadata has been updated to be more robust by avoiding a hardcoded dictionary key. My review points out a potential crash in this new metadata handling and provides a suggestion for a safer implementation.
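
As an illustration of the safer pattern the review suggests, a sketch is below. The helper name and dict shape are assumptions, not the actual NPUModelRunner code; the point is to guard the empty case and take the first entry rather than indexing a hardcoded layer-name key.

```python
from typing import Any, Optional


def first_attn_metadata(attn_metadata: dict[str, Any]) -> Optional[Any]:
    # Hypothetical helper: avoid a KeyError (or a crash on an empty
    # dict) by guarding the empty case instead of assuming a fixed key
    # such as a specific layer name is always present.
    if not attn_metadata:
        return None
    return next(iter(attn_metadata.values()))
```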

Signed-off-by: linfeng-yuan <1102311262@qq.com>
@linfeng-yuan linfeng-yuan force-pushed the fix_deepseek_mtp_with_shared_expert_dp branch from 9669e03 to b359070 on September 21, 2025 19:34
…ange in NPUModelRunner

Signed-off-by: linfeng-yuan <1102311262@qq.com>
…_cache_bytes disabled

Signed-off-by: linfeng-yuan <1102311262@qq.com>
@linfeng-yuan linfeng-yuan changed the title from "[bugfix] fix deepseek mtp with enable_shared_expert_dp" to "[misc][torchair] fix bugs around deepseek mtp, enable_shared_expert_dp and use_cached_kv_cache_bytes" Sep 22, 2025
@linfeng-yuan linfeng-yuan added the ready (read for review) and ready-for-test (start test by label for PR) labels Sep 22, 2025
@gemini-code-assist bot (Contributor) commented

Hello! You've invoked me with /gemini. If you intended to use a specific command, please specify it after /gemini. For a list of available commands and how to use them, please refer to the help message by typing /gemini help or checking the "Using Gemini Code Assist" section in the PR description.

@linfeng-yuan linfeng-yuan removed the ready (read for review) and ready-for-test (start test by label for PR) labels Sep 22, 2025
@linfeng-yuan linfeng-yuan added the ready (read for review) and ready-for-test (start test by label for PR) labels Sep 22, 2025
@linfeng-yuan (Collaborator, Author) commented

/gemini review

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces several targeted fixes for bugs related to deepseek_mtp, enable_shared_expert_dp, and KV cache memory management. The changes correctly adjust MoE state handling and metadata synchronization for shared_expert_dp, prevent an unnecessary reduction in available KV cache memory when use_cached_kv_cache_bytes is disabled, and improve the robustness of accessing attention metadata. The fixes appear correct and well-aligned with the stated objectives. I have no major concerns with this pull request.
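
A hedged sketch of the KV-cache sizing fix described above; the constant and function names are assumptions, with only the 64 MB default margin and the use_cached_kv_cache_bytes flag taken from the PR description.

```python
KV_CACHE_MARGIN_BYTES = 64 * 1024 * 1024  # assumed 64 MB default margin


def usable_kv_cache_bytes(free_bytes: int,
                          use_cached_kv_cache_bytes: bool) -> int:
    # Subtract the safety margin only when cached kv_cache bytes are in
    # use; skipping it otherwise avoids the unnecessary memory decline.
    if use_cached_kv_cache_bytes:
        return max(0, free_bytes - KV_CACHE_MARGIN_BYTES)
    return free_bytes
```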

Signed-off-by: linfeng-yuan <1102311262@qq.com>
@linfeng-yuan linfeng-yuan force-pushed the fix_deepseek_mtp_with_shared_expert_dp branch from 1d27f71 to 6fd21e8 on September 23, 2025 04:07
@linfeng-yuan (Collaborator, Author) commented

@wangxiyuan I pushed a new commit to tackle the last problem with large EP deployments and to fix the UT breakage. Please review this commit and check whether there is any blocking problem for merging this PR.

@wangxiyuan wangxiyuan merged commit d01fd1d into vllm-project:main Sep 23, 2025
19 checks passed